Local variation of hashtag spike trains and popularity in Twitter

We draw a parallel between hashtag time series and neuron spike trains. In each case, the process presents complex dynamic patterns including temporal correlations, burstiness, and other types of nonstationarity. We propose adopting the so-called local variation to uncover salient dynamics while properly detrending for the time-dependent features of a signal. The methodology is tested on both real and randomized hashtag spike trains and shows that popular hashtags exhibit more regular, and thus less bursty, behavior, suggesting its potential use for predicting online popularity in social media.

INTRODUCTION

The statistical properties of Twitter and, more generally, of human activity are characterized by strong heterogeneity in several dimensions. First, human behavior is known to generate bursty temporal patterns that deviate significantly from independent Poisson processes: a majority of events take place over short time scales, while a few events are separated by very long times. This property translates into fat-tailed distributions for the time intervals $\Delta\tau$ between occurrences of a certain type of event, e.g. between two phone calls or two emails sent by an individual. For instance, the inter-event time distribution $P(\Delta\tau)$ for the timings between two tweets of a user, or between two uses of a hashtag, is well fitted by a power law such as $P(\Delta\tau) \sim \Delta\tau^{-\beta}$. The deviation from an exponential (uncorrelated) distribution may be driven either by complex decision-making and cascading mechanisms, or by the time dependency of the underlying process, partly because of its intrinsic circadian and weekly rhythms [6], as described in Fig. 1, or by a combination of these factors. Importantly, the nonstationarity of the signal is known to broaden $P(\Delta\tau)$ and therefore to artificially increase the value of standard metrics, such as the variance or the Fano factor, originally defined for stationary processes. Recently, a stochastic model for a stationary process has also suggested a broad distribution of online user activity levels on long time scales, longer than $\Delta\tau$.

In addition to temporal heterogeneity in $\Delta\tau$, online human activity often generates heterogeneity in popularity. In the following, we focus on the popularity of hashtags in Twitter. Twitter is a micro-blogging service that allows users to post short messages and to follow those published by other users. Messages often incorporate hashtags, keywords identified by the symbol #, which users can track and respond to, making the platform interactive. Hashtags play a significant role in information diffusion: they enhance the spreading of information and rumors and consequently increase the impact of news. Discussions on protests and political elections, advertisement of new products in marketing, announcements of scientific innovations, panic events such as earthquakes, and comments on TV shows are some examples where hashtags are widely used. Additionally, hashtags can even be used to track and locate crises, and can spread under the influence of both endogenous factors, that is, propagation between Twitter users following each other, and exogenous sources such as TV and newspapers.

FIG. 1. Circadian pattern of tweeting activity. The increasing number of tweets from midday (12:00) to midnight (00:00) is shown in the yellow shaded regions. Significant decays of activity are observed during the night. Activity increases during mornings, as shown in the purple shaded rectangles. In the inset, we show the temporal evolution at a finer scale, where fluctuations are visible. The data exhibit two peaks of activity: on the evening of a political debate, on May 2, 2012 (first peak), and on election day, May 6, 2012 (second peak).

The popularity $p$ of a hashtag is measured by the number of times that it appears in an observation time window. While a majority of hashtags attract no attention, only very few of them propagate heavily. Understanding the mechanisms by which certain hashtags or messages gain attention is a central topic of research in the study of online social media [18]. Potential mechanisms for the emergence of this heterogeneity include forms of preferential attachment and competition-induced forces [19-22] driven by the limited attention of users.

Our main purpose is to explore connections between temporal and popularity heterogeneity. As a first contribution, we introduce a temporal measure for online human dynamics that is suited to the analysis of nonstationary time series and quantifies bursts, regularity, and temporal correlations. Originally defined for the study of inter-spike intervals of neurons [23-27], the so-called local variation $L_V$ is shown to identify and characterize deviations from Poisson (uncorrelated) processes, and to help predict successful hashtags.


DATA MINING AND BASIC ANALYSIS 
Data collection and basic overview 

The data set was collected via the publicly open Twitter streaming API between April 30, 2012, 10 pm and May 10, 2012, 10 pm. Only a geographical constraint was applied: the actions of all Twitter users located in France were considered, in order to avoid time differences between countries and regions, and no language filtering was applied. The time resolution is 1 second, and multiple activities can be recorded in the same second. During this time period, two major public events took place: an important political debate held on May 2 and the 2012 French presidential election held on May 6. These events are not the topic of this work, but they are clearly visible in the time series, as shown in Fig. 1.

The total number of tweets, including retweets, captured during the data collection is 9,747,351. The total number of tweets including at least one hashtag is 2,942,239, so around 30% of the tweets contain a hashtag. We do not distinguish whether hashtags are used in regular tweets or in retweets. Moreover, any message (identical or not) containing at least one hashtag is recorded. Due to the debate and the election taking place during the data collection, the most popular hashtags are related to politics, as seen in Table I. The time series of the hashtags studied in this paper are provided in the Supporting Information (SI). A total of 473,243 individual users have been identified. Among those, 228,525 users published at least one hashtag, i.e. almost half of the social network is associated with hashtag diffusion. In order to further characterize the importance of hashtags in Twitter activity, we compare the total number of seconds in which any action is performed in the data set, 763,262 s ≈ 8.8 days and thus 88% of the total duration, to the number of seconds in which at least one hashtag is published, 667,996 s ≈ 7.7 days, that is 77% of the total duration. In any case, the hashtag data cover a majority of the time window, even during off-peak hours. These numbers confirm the importance of hashtags in the Twitter ecosystem, and their prevalence in a variety of contexts.


TABLE I. Ranking of popular hashtags. The first 40 most used hashtags are listed with the corresponding popularity $p$. The hashtags related to the debate and the presidential election, such as ledebat, hollande, sarkozy, votehollande, france2012, and présidentielle, are recognized.

rank  hashtag         popularity p    rank  hashtag         popularity p
1     ledebat         180946          21    ns              18715
2     hollande        143636          22    ps              18492
3     sarkozy         116906          23    teamfollowback  18476
4     votehollande    99908           24    ggi             17734
5     radiolondres    97622           25    bastille        16056
6     bahrain         71571           26    présidentielle  13799
7     fh2012          67759           27    afp             13710
8     avecsarkozy     67549           28    france2         12906
9     ledébat         66668           29    Syria           11594
10    ff              49499           30    psg             10566
11    ns2012          40337           31    sarko           10503
12    limp            25125           32    tfl             10201
13    thevoice        24696           33    mutualite       10093
14    fr              24249           34    egypt           9970
15    bayrou          23029           35    lavictoire      9949
16    fh              22369           36    fn              9763
17    rt              21598           37    franceforte     9626
18    france2012      20635           38    placeaupeuple   9211
19    reseaufdg       19488           39    jemesouviens    9098
20    france          19268           40    bfmtv           9010


Any type of human activity is influenced by circadian and weekly cycles. This observation has been verified in recent years in a variety of social data sets, ranging from mobile phone data to online social media. In addition, deviations from these cycles can help detect atypical events such as responses to catastrophes. Fig. 1 in the introduction shows the total number of tweets per minute over a sub-period of 6 days and confirms these findings, with clear circadian patterns and two peaks during major public events related to the 2012 French presidential election. Besides this smooth periodic behavior, the data also exhibit a noisy signal at a finer time scale, as shown in the inset of Fig. 1. In the following, we analyze the properties of this complex time series by decomposing it into groups of hashtags depending on their popularity, and uncover temporal statistical differences between these groups.


Heterogeneity in popularity of hashtags

The success of a hashtag can be measured by its popularity $p$, defined as its number of occurrences and equivalent to its frequency. Fig. 2 presents the Zipf plot and the probability density function (PDF) of $p$ for the 295,697 unique hashtags observed in the data set. The Zipf plot [Fig. 2(a)] indicates that more than half of the hashtags (≈ 60%) appear just once in the data set, with $p = 1$. Moreover, around 83% of the hashtags have $p < 5$; they lie in the pink-colored region, the last (rightmost) rectangle of Fig. 2(a). For moderate values of $p$, if we set a lower threshold of $p = 1000$ and an upper bound of 25000, only 0.15% of the hashtags fit in the yellow-colored rectangle. Finally, top hashtags with $p > 25000$, in the red-colored rectangle, are very rare (≈ 0.0001%), but more frequent than would be expected for values so large compared to the median. These observations are confirmed in Fig. 2(b), where we show the probability distribution of $p$, $P(p)$, in a log-log plot. $P(p)$ is a clear example of a fat-tailed distribution associated with strong heterogeneity in the system.
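
As a minimal sketch of how such popularity statistics can be computed, the following Python snippet counts hashtag occurrences, ranks them as in a Zipf plot, and measures the share of hashtags in a popularity range. The function names and the toy stream are illustrative assumptions and are not taken from the original analysis.

```python
from collections import Counter

def popularity_counts(hashtag_stream):
    """Count how many times each hashtag occurs (its popularity p)."""
    return Counter(hashtag_stream)

def zipf_ranking(counts):
    """Return (hashtag, p) pairs ordered from the most to the least popular."""
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

def share_in_range(counts, low, high=None):
    """Fraction of distinct hashtags with low <= p < high (no upper bound if high is None)."""
    selected = [p for p in counts.values() if p >= low and (high is None or p < high)]
    return len(selected) / len(counts)

# Toy stream (hypothetical data): one hashtag dominates, most appear once.
stream = ["ledebat"] * 6 + ["psg"] * 2 + ["a", "b", "c"]
counts = popularity_counts(stream)
ranking = zipf_ranking(counts)
print(ranking[0])                      # most popular hashtag and its p
print(share_in_range(counts, 1, 2))    # share of hashtags used exactly once
```

Plotting rank against $p$ on log-log axes would then give the Zipf plot, and a histogram of the values of $p$ with logarithmic bins would give an estimate of $P(p)$.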

The heterogeneity in $p$ has already been observed [3]. A mechanism proposed for its emergence is the competition between information overload and the limited capacity of each user [19-22], sometimes coupled with cooperative effects. It has also been shown that hashtags with unique textual features become more popular than hashtags with common textual features [28]. In this paper, we are not interested in the origin of the heterogeneity, but in its relation to the temporal characteristics of hashtags.


HASHTAG SPIKE TRAINS 
Temporal heterogeneity 

We draw an analogy between hashtag dynamics and neuron spike trains. To this end, we introduce standard methods from spike train analysis into the field of hashtag dynamics. Hashtags are keywords associated with different topics, which can be created, tracked, and reused by users. Their popularity and unambiguity make them an essential mechanism for information diffusion in Twitter. The statistical description of neuron spike sequences is essential for extracting underlying information about the brain [20]. It was originally believed that in vivo cortical neurons behave as time-dependent Poisson random spike generators, where successive inter-spike intervals are independently drawn from an exponential distribution with a time-dependent firing rate [50]. However, more recent observations have shown that the inter-spike interval distribution exhibits significant deviations from the exponential distribution, which has led to the construction of appropriate tools to describe neuron signals [23-27].

FIG. 2. Heterogeneity in the hashtag popularity $p$, shown in (a) a Zipf plot and (b) the probability density function (PDF), $P(p)$. (a) The diversity in $p$ (frequency) is visible as a power-law scaling in the log-log plot. We rank hashtags from high $p$ (left) to low $p$ (right). The colored shaded rectangles highlight the value of $p$, from red and orange (high $p$) to purple and pink (low $p$). The percentages describe the overall contributions of the corresponding rectangles. (b) Similarly, $P(p)$ decays slowly and presents a power-law distribution with a fat tail. The same color scheme as in (a) is applied to visualize the contributions of different values of $p$.


FIG. 3. The cumulative (a), CDF($\Delta\tau$), and probability (b), $P(\Delta\tau)$, distributions of the inter-hashtag spike intervals. We observe that $P(\Delta\tau)$ exhibits non-exponential features for the different classes of hashtags distinguished by their popularity. The colors correspond to those in Fig. 2. The legend provides the average popularity $\langle p \rangle$ in each hashtag class. The dashed lines indicate the positions of 1 day, 2 days, and 3 days, where $P(\Delta\tau)$ shows peaks for low $p$ (pink symbols). The binning varies from 8 minutes to 2 hours depending on $p$: 8 minutes for high $p$ (red-orange), 1.5 hours for moderate $p$ (yellow-green-blue-purple), and 2 hours for low $p$ (pink). All $P(\Delta\tau)$ present maxima at 1 second, which are not shown so that the tails can be displayed over a larger window.

Similarly, a hashtag spike train is defined as the sequence of timings at which a hashtag is observed in Twitter. In this framework, we do not specify the type of dynamics of hashtags, whether endogenous, i.e. hashtag diffusion among members of the social network, or exogenous, i.e. diffusion driven by external factors such as TV and newspapers; we are interested only in the timings. Each hashtag thus generates a unique hashtag spike train with a characteristic popularity $p$. As a first basic indicator, in Figs. 3(a,b) we show the cumulative and probability distributions of the inter-hashtag spike intervals, CDF($\Delta\tau$) and $P(\Delta\tau)$, respectively. In order to avoid artificially deforming the distributions because of heterogeneity in $p$, we group CDF($\Delta\tau$) and $P(\Delta\tau)$ into classes depending on $p$, illustrated by the different colors of Fig. 2. We observe similar behavior across the classes, as $P(\Delta\tau)$ deviates strongly from the exponential distribution of a Poisson process, $P(\Delta\tau) = \xi e^{-\xi \Delta\tau}$, where $\xi$ is the firing rate (the frequency, and so $p$, in our context) at which hashtags appear. Instead, we observe fat-tailed distributions, as shown in Fig. 3(b) for high and moderate $p$. As mentioned in the introduction, this deviation may originate either from temporal correlations or from nonstationary patterns, making the system different from a stationary, uncorrelated random signal.
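
The comparison between an empirical inter-event distribution and an exponential one can be sketched as follows. This is an illustrative Python example on a hypothetical toy spike train, not the procedure used to produce Fig. 3; the function names are assumptions, and the exponential rate is simply estimated as the inverse of the mean interval.

```python
import math

def inter_event_intervals(times):
    """Intervals between successive events of one spike train."""
    ordered = sorted(times)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]

def empirical_cdf(values):
    """Distinct sorted values and the fraction of samples <= each of them."""
    ordered = sorted(values)
    n = len(ordered)
    xs, cdf = [], []
    for k, value in enumerate(ordered, start=1):
        if xs and xs[-1] == value:
            cdf[-1] = k / n        # same value again: update its cumulative share
        else:
            xs.append(value)
            cdf.append(k / n)
    return xs, cdf

def exponential_cdf(x, rate):
    """CDF of the exponential distribution with the given rate."""
    return 1.0 - math.exp(-rate * x)

# Toy spike train (hypothetical times in seconds): short bursts, long gaps.
times = [0, 1, 2, 3, 50, 51, 52, 200]
intervals = inter_event_intervals(times)
xs, cdf = empirical_cdf(intervals)
rate = len(intervals) / sum(intervals)   # inverse of the mean interval
for x, c in zip(xs, cdf):
    print(x, round(c, 3), round(exponential_cdf(x, rate), 3))
```

In this toy train most intervals are much shorter than the mean while a few are much longer, which is the qualitative signature of burstiness described in the text.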


Real and randomized data sets 

We analyze two sets of data, which we now describe: the empirical data set, taken directly from the collected data, and a randomized data set, serving as a null model in our analysis.

FIG. 4. Real and artificial hashtag spike trains. (a) Illustration of different hashtag spike trains of the data set, representing different types of hashtag propagation. (b) Merging of the hashtag spike trains of the real data. The black spikes indicate that only one activity is counted if multiple activities occur at the same time. (c) Randomization procedure using randperm (Matlab). $T$ contains the full hashtag activity of the data set. randperm returns $p$ unique, independent numbers out of $T$, which construct the random time series ..., $t'_{i-1}$, $t'_i$, $t'_{i+1}$, ... from the full hashtag activity matrix $T$. (d) The resulting artificial hashtag spike train.

The real data set contains one spike train per hashtag, as illustrated in Fig. 4(a). The time resolution of the spikes is the same as that of the data set, that is 1 second. When multiple spikes of the same hashtag take place at the same time, only one event is considered. The statistics of such events are provided at the end of this subsection. In each spike train, the spikes are ordered from the earliest to the latest appearance time.
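
The construction of the real spike trains can be sketched in a few lines of Python. The input format, a list of (timestamp in seconds, hashtag) pairs, and the function name are assumptions made for illustration; the source does not specify how the raw records are stored.

```python
from collections import defaultdict

def build_spike_trains(records):
    """Map each hashtag to its sorted list of distinct event times (in seconds)."""
    seen = defaultdict(set)
    for timestamp, hashtag in records:
        # Events of the same hashtag in the same second collapse to one spike.
        seen[hashtag].add(int(timestamp))
    return {tag: sorted(times) for tag, times in seen.items()}

# Hypothetical records: "ledebat" appears twice in second 5.
records = [(5, "ledebat"), (2, "ledebat"), (5, "ledebat"), (7, "psg")]
trains = build_spike_trains(records)
print(trains)
```

Using a set per hashtag enforces the one-event-per-second rule, and sorting afterwards gives the ordering from the earliest to the latest spike.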

The random data set is a randomized version of the real data set, where each spike train of size $p$ generates a spike train of the same size with random times. In practice, we first combine all hashtag spike trains and obtain one merged hashtag spike train, as illustrated in Fig. 4(b). This train carries the full history of all hashtags and, importantly, reproduces the nonstationary features of the original data in the presence of temporal correlations, burstiness, and the cyclic rhythm. As before, if two or more spikes are generated at the same time, only one spike is kept at that time in the merged spike train; see e.g. the black spikes in Fig. 4(b).

FIG. 5. The probability distribution $P(c_h)$ of the count $c_h$ of hashtag activity per second. Except for the most popular hashtags, listed in Table I with ranks 1-11 and presented here with red symbols, multiple activity within 1 second is very rare. The colors correspond to those in Figs. 2 and 3. The legend provides the average popularity $\langle p \rangle$ in each hashtag class.

Randomization is performed by permuting elements, as shown in Fig. 4(c), for instance by using randperm(T, p) in Matlab. Here, $T$ represents the full matrix of times in the merged spike train and $p$ is the desired popularity, i.e. the total number of spikes in a train. The permutation procedure draws $p$ unique, uniformly distributed numbers out of $T$, and these numbers define the artificial spike train, e.g. ..., $t'_{i-1}$, $t'_i$, $t'_{i+1}$, ..., as shown in Fig. 4(c). In our data set, $p \ll T$ always holds, as the maximum $p$ is 180,900 and the length of $T$ is 667,996. This procedure is applied to each spike train of size $p$ [Fig. 4(d)]. By generating independent, yet time-dependent events, the procedure is expected to create time-dependent Poisson random processes, $P(\Delta\tau, t) = \xi(t) e^{-\xi(t) \Delta\tau}$, where the firing rate $\xi(t)$ in this case explicitly depends on the time of the day and of the week.
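
The null model can be sketched in Python as follows. This is not the authors' Matlab code: Python's `random.sample`, which draws without replacement, is used here as a stand-in for randperm, and the function names and toy trains are hypothetical.

```python
import random

def merge_trains(trains):
    """Union of all spike times: one entry per second with any hashtag activity."""
    merged = set()
    for times in trains.values():
        merged.update(times)
    return sorted(merged)

def randomize_train(merged, size, rng):
    """Draw `size` distinct times uniformly from the merged train, in time order."""
    return sorted(rng.sample(merged, size))

# Hypothetical spike trains (times in seconds).
trains = {"ledebat": [1, 2, 3, 10], "psg": [2, 8, 9], "afp": [20, 21]}
merged = merge_trains(trains)
rng = random.Random(0)                       # fixed seed for reproducibility
artificial = randomize_train(merged, len(trains["psg"]), rng)
print(merged)
print(artificial)
```

Because the artificial times are sampled from the merged train, periods of high overall activity are sampled more often, so the null model inherits the circadian and weekly rhythm of the data while destroying hashtag-specific correlations.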

Statistics of multiple tweets in 1 second. We detect multiple occurrences within 1 second for 6661 hashtags. Fig. 5 presents the probability distribution $P(c_h)$ of observing $c_h$ occurrences of a hashtag during one second, for the different hashtag popularity classes. Even though $c_h > 1$ occurs rarely, we observe that it is more probable for popular hashtags (red open circles), as expected. For the most popular hashtag, ledebat, one finds $\max(c_h) = 40$.


LOCAL VARIATION 


The time series of spike trains are inherently nonstationary, as shown in Fig. 1. For this reason, metrics defined for stationary processes are inadequate and might lead to incorrect conclusions. For instance, the non-exponential shape of the inter-event time distribution $P(\Delta\tau)$ in Fig. 3 might originate either from correlated and collective dynamics, or from the nonstationarity of the hashtag propagation. Statistical indicators based on this distribution, such as its variance or the Fano factor, might be affected in the same way. We therefore consider here the so-called local variation $L_V$, originally defined to determine the intrinsic temporal dynamics of neuron spike trains [23-27].

Unlike quantities such as $P(\Delta\tau)$, $L_V$ compares temporal variations with their local rates and is specifically defined for nonstationary processes [27]:

$$L_V = \frac{3}{N-2} \sum_{i=2}^{N-1} \left( \frac{(\tau_{i+1} - \tau_i) - (\tau_i - \tau_{i-1})}{(\tau_{i+1} - \tau_i) + (\tau_i - \tau_{i-1})} \right)^2 \tag{1}$$

Here, $N$ is the total number of spikes and ..., $\tau_{i-1}$, $\tau_i$, $\tau_{i+1}$, ... represents the successive times of a single hashtag spike train. Eq. (1) also takes the form

$$L_V = \frac{3}{N-2} \sum_{i=2}^{N-1} \left( \frac{\Delta\tau_{i+1} - \Delta\tau_i}{\Delta\tau_{i+1} + \Delta\tau_i} \right)^2 \tag{2}$$

where $\Delta\tau_{i+1} = \tau_{i+1} - \tau_i$ and $\Delta\tau_i = \tau_i - \tau_{i-1}$. $\Delta\tau_{i+1}$ quantifies the forward delay and $\Delta\tau_i$ represents the backward waiting time for an event at $\tau_i$. Importantly, the denominator normalizes the quantity so as to account for local variations of the rate at which events take place. By definition, $L_V$ takes values in the interval [0, 3].
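
A direct transcription of this definition into Python is given below as a minimal sketch. The function name is an assumption, and the choice to collapse repeated time stamps before computing the sum follows the one-event-per-second convention of the data set described earlier.

```python
def local_variation(times):
    """Local variation L_V of a spike train given as a sequence of event times."""
    t = sorted(set(times))            # one event per time stamp, in time order
    n = len(t)
    if n < 3:
        raise ValueError("L_V needs at least three distinct event times")
    total = 0.0
    for i in range(1, n - 1):
        forward = t[i + 1] - t[i]     # forward delay after the event at t[i]
        backward = t[i] - t[i - 1]    # backward waiting time before the event at t[i]
        total += ((forward - backward) / (forward + backward)) ** 2
    return 3.0 * total / (n - 2)

regular = [0, 10, 20, 30, 40]         # perfectly periodic train
bursty = [0, 1, 2, 100, 101, 102]     # two bursts separated by a long gap
print(local_variation(regular))       # 0.0
print(local_variation(bursty))        # larger than 1
```

For the periodic train every pair of successive intervals is equal, so each term vanishes and $L_V = 0$. For the bursty train the two terms that straddle the long gap each contribute $(97/99)^2$, giving $L_V = \frac{3}{4} \cdot 2 \cdot (97/99)^2 \approx 1.44$.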

The local variation $L_V$ has properties that make it an interesting candidate for the analysis of hashtag spike trains. In particular, $L_V$ is on average equal to 1 when the random process is either a stationary or a nonstationary Poisson process [23], with the only condition that the time scale over which the firing rate $\xi(t)$ fluctuates is longer than the typical time between spikes. Deviations from 1 originate from local correlations in the underlying signal, either in the form of pairwise correlations between successive inter-event time intervals, e.g. $\Delta\tau_{i+1}$ and $\Delta\tau_i$, which tend to decrease $L_V$, or because the inter-event time distribution is non-exponential. An interesting case is given by Gamma processes [23, 25]

$$P(\Delta\tau; \xi, \kappa) = \frac{(\xi\kappa)^{\kappa} \, \Delta\tau^{\kappa-1} \, e^{-\xi\kappa\Delta\tau}}{\Gamma(\kappa)} \tag{3}$$


where $\kappa$ is called the shape parameter and determines the shape of the distribution, and $\Gamma$ is the Gamma function. Here, $\xi$ and $\kappa$ are the two parameters of the Gamma process. While $\xi$ determines the speed of the dynamics, $\kappa$ controls the burstiness (irregularity) of the spike trains. Assuming that events are independently drawn, the shape parameter is related to $L_V$ as follows [23]:

$$\langle L_V \rangle = \frac{3}{2\kappa + 1} \tag{4}$$

Here, the brackets denote the average taken over the given distribution [23]. When $\kappa = 1$, an exponential is recovered, and one finds $\langle L_V \rangle = 1$, as expected. Smaller values of $\kappa$ increase the variance of $\Delta\tau$ and therefore its burstiness, making $\langle L_V \rangle$ larger than 1. On the other hand, larger values of $\kappa$ decrease the variance of $\Delta\tau$ and the burstiness of the process, making $\langle L_V \rangle$ smaller than 1 and approaching 0 for very large $\kappa$.
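
The dependence of $\langle L_V \rangle$ on the shape parameter can be checked numerically. The sketch below draws independent Gamma-distributed intervals with Python's `random.gammavariate` and compares the measured $L_V$ with the standard result for independent Gamma intervals, $\langle L_V \rangle = 3/(2\kappa + 1)$. The function names, sample size, and seed are arbitrary choices made for illustration.

```python
import random

def local_variation_from_intervals(intervals):
    """L_V computed from successive inter-event intervals."""
    total = 0.0
    for previous, following in zip(intervals, intervals[1:]):
        total += ((following - previous) / (following + previous)) ** 2
    # N spikes give N - 1 intervals and N - 2 pairs of successive intervals.
    return 3.0 * total / (len(intervals) - 1)

def simulate_gamma_lv(kappa, xi=1.0, n_intervals=20000, seed=0):
    """L_V of a simulated Gamma process with shape kappa and rate xi."""
    rng = random.Random(seed)
    scale = 1.0 / (xi * kappa)        # keeps the mean interval equal to 1 / xi
    intervals = [rng.gammavariate(kappa, scale) for _ in range(n_intervals)]
    return local_variation_from_intervals(intervals)

for kappa in (0.5, 1.0, 2.0):
    measured = simulate_gamma_lv(kappa)
    expected = 3.0 / (2.0 * kappa + 1.0)
    print(kappa, round(measured, 3), round(expected, 3))
```

Since $L_V$ only involves ratios of intervals, the rate $\xi$ drops out and only $\kappa$ matters: the expected values are 1.5 for $\kappa = 0.5$ (bursty), 1 for $\kappa = 1$ (Poisson), and 0.6 for $\kappa = 2$ (regular).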

We measure $L_V$ of the hashtag spike trains and group the values depending on the popularity $p$ of their hashtag, as was done in Figs. 2 and 3. Fig. 6 shows scatter plots of $L_V$ for the real data set (a), i.e. the empirical sequence ..., $\tau_{i-1}$, $\tau_i$, $\tau_{i+1}$, ..., and the random data set (b), i.e. the random sequence ..., $t'_{i-1}$, $t'_i$, $t'_{i+1}$, ..., on linear-log plots. Different colors are used to distinguish the different groups, and the inset legend provides the average popularity $\langle p \rangle$ in the groups.

A more readable representation is provided in Fig. 7, where we show histograms $P(L_V)$ of the values of $L_V$ for the two data sets and for the different hashtag groups in $p$. The results clearly show that $L_V$ fluctuates around 1 in the random data set, as expected for a time-dependent Poisson process. On the other hand, $L_V$ systematically deviates from 1 in the original data set, where temporal correlations are present.

The observation is confirmed in Fig. 8(a), where we plot the mean $\mu(L_V)$ of $L_V$, with error bars, as a function of $\langle p \rangle$. Furthermore, $L_V$ of the original data indicates that high-impact hashtags (high $p$) are associated with lower values of $L_V$, suggesting more homogeneous (regular) time distributions. These results confirm the potential use of $L_V$ as a metric to capture deviations from Poisson (temporally uncorrelated) processes, but also to identify distinct statistical properties generated specifically at high $p$. Moreover, Fig. 8(b) presents the statistical differences between the real and the random spike trains in detail. The deviations from Poisson processes, for which $\mu_0(L_V) = 1$, are calculated by $z = \left(\mu(L_V) - \mu_0(L_V)\right) / \left(\sigma(L_V)/\sqrt{n}\right)$, with $\sigma(L_V)$ the standard deviation of $L_V$ and $n$ the number of data points in the distributions of Fig. 7. We observe that the values of $z$ for the random spike trains are almost equal to 0, except at high $p$, indicating the agreement between Poisson signals and our random spike trains. This is not the case for the real trains, which give $z \neq 0$ for all $\langle p \rangle$.
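
The $z$ value of one popularity class can be computed as in the following sketch. The use of the sample standard deviation (`statistics.stdev`) is an assumption, since the text does not state which estimator of $\sigma(L_V)$ is used, and the list of $L_V$ values is a hypothetical example.

```python
import math
import statistics

def z_score(lv_values, mu0=1.0):
    """Deviation of the mean L_V of a class from the Poisson expectation mu0."""
    n = len(lv_values)
    mean = statistics.mean(lv_values)
    sigma = statistics.stdev(lv_values)   # sample standard deviation (assumption)
    return (mean - mu0) / (sigma / math.sqrt(n))

# Hypothetical L_V values of four hashtags in one popularity class.
lv_class = [0.5, 0.6, 0.7, 0.6]
z = z_score(lv_class)
print(round(z, 2))
```

Here the mean is 0.6 and the standard error is about 0.041, so $z \approx -9.8$: the class lies far below the Poisson value $\mu_0 = 1$, on the regular side.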

To conclude, we test the persistence through time of the temporal characteristics of hashtags, as measured by $L_V$. To do so, we divide each hashtag time series into two time series. The resulting values of the local variation are $L_V(t_1)$ for the first half of a spike train and $L_V(t_2)$ for the second half, and we calculate the Pearson correlation coefficient $r(L_V(t_1), L_V(t_2))$ between these values [31]. In Fig. 9(a) we show the linear relation between $L_V(t_1)$ and $L_V(t_2)$. Fig. 9(b) shows $r(L_V(t_1), L_V(t_2))$ as a function of the average popularity $\langle p \rangle$. Both indicate that the values of $L_V$ for the same hashtag at different times are significantly correlated. We also observe that bursty (low $p$) and regular (high $p$) signals give small $r$, while the spike trains with moderate $p$ provide the largest values of $r$, where $L_V$ suggests more uniform temporal behavior throughout the individual trains.
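
A possible implementation of this persistence test is sketched below. Splitting each train into halves with equal numbers of events is an assumption (the text does not say whether the split is by event count or by time), and the function names and toy trains are hypothetical.

```python
import math

def local_variation(times):
    """Local variation L_V of a spike train given as a sequence of event times."""
    t = sorted(set(times))
    n = len(t)
    if n < 3:
        raise ValueError("L_V needs at least three distinct event times")
    total = 0.0
    for i in range(1, n - 1):
        forward, backward = t[i + 1] - t[i], t[i] - t[i - 1]
        total += ((forward - backward) / (forward + backward)) ** 2
    return 3.0 * total / (n - 2)

def split_half_lv(times):
    """L_V of the first and of the second half of a train (split by event count)."""
    t = sorted(set(times))
    mid = len(t) // 2
    return local_variation(t[:mid]), local_variation(t[mid:])

def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

# Hypothetical trains: one regular, one bursty, one intermediate.
trains = [
    list(range(0, 60, 10)),
    [0, 1, 2, 50, 51, 52, 100, 101],
    [0, 5, 7, 15, 20, 22, 30, 35],
]
halves = [split_half_lv(train) for train in trains]
first = [h[0] for h in halves]
second = [h[1] for h in halves]
r = pearson(first, second)
print(round(r, 3))
```

In practice the correlation would be computed separately within each popularity class, over all hashtags of the class that have enough events in both halves.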


DISCUSSION 

The main purpose of this paper is to introduce a statistical measure suitable for the analysis of nonstationary time series, as often found in online social media and communications in social systems. As a test case, we have focused on the dynamics of hashtags in Twitter. However, the same methodology could also be applied to other types of correlated, bursty, and nonstationary signals, for instance the dynamics of cascades in Twitter and Facebook or phone call activity.

Instead of measuring standard statistical properties of noisy hashtag signals, such as the inter-event time distribution, its variance, or the Fano factor, conventionally applied to characterize the burstiness of a signal, we have focused on the local variation $L_V$, a metric capturing the fluctuations of the signal as compared to a local characteristic time. This measure, previously defined for neuron spike train analysis, uncovers the regularity and the firing rate of the trains [23-27] and so helps to identify local temporal correlations. It is important to stress that the current analysis focuses exclusively on properties of time series, and considers neither the mechanisms leading to the observed statistical dynamic properties nor the effect of the underlying topology, e.g. through following-follower relations. An interesting line of research would be to study the relation between $L_V$, the underlying topology [35], and diffusive models, for instance Hawkes processes. In addition, both neurons [30] and hashtags can be driven by multiple firing rates, and an $L_V$ analysis associated with Gamma distributions would provide more concrete results on hashtag spike trains, as done for neuron spikes [25].

FIG. 6. The local variation $L_V$ of hashtag spike trains versus popularity $p$ on a linear-log plot. Each color and symbol summarized in the legend represents a different range of $p$: low $p$ in pink and purple, moderate $p$ in blue, green, and yellow, and high $p$ in orange and red. In addition, the average popularity $\langle p \rangle$ indicated in the legend ranks colors and symbols quantitatively. (a) Hashtag spike trains of the data set. (b) Artificial hashtag spike trains.

We should also note that the finite temporal resolution of the data (1 s), and the fact that multiple events per time window are neglected, tend to make $L_V$ artificially decrease for popular hashtags. In an extreme case, the time series becomes perfectly regular, with events taking place every second. In this work, we have therefore carefully verified that fluctuations in $L_V$ are not artificially driven by these limitations. To do so, we have compared the values of $L_V$ in the empirical data with those of a null model. We observe a small decay of $L_V$ for popular hashtags in the null model (see Fig. 8), but this decay is much more limited than the one observed in the empirical data, e.g. $L_V = 0.89$ for $\langle p \rangle = 10^5$ in the null model, while $L_V = 0.54$ for the real data. In addition, a decay of $L_V$ in real hashtag data is also present for moderately popular hashtags, where multiple events per second are very rare. An interesting research direction would be to generalize the definition of local variation in order to allow for the analysis of multiple events per time window, thereby evaluating the deviations of dense time series from nonstationary Poisson processes. Finally, in a finite time window, as observed in empirical data, the statistics of high-frequency hashtags are much better than those of low-frequency hashtags, simply because the former occur many more times than the latter. For this reason, measurements of $L_V$ for low-popularity hashtags are more subject to noise.
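
The effect of the finite resolution on dense trains can be illustrated with a small simulation. The sketch below is not part of the original analysis: it generates a stationary Poisson train with a hypothetical rate of one event per second, rounds the times down to whole seconds, keeps one spike per second, and compares $L_V$ before and after this coarse-graining.

```python
import random

def local_variation(times):
    """Local variation L_V of a spike train given as a sequence of event times."""
    t = sorted(set(times))
    n = len(t)
    total = 0.0
    for i in range(1, n - 1):
        forward, backward = t[i + 1] - t[i], t[i] - t[i - 1]
        total += ((forward - backward) / (forward + backward)) ** 2
    return 3.0 * total / (n - 2)

def poisson_times(rate, n_events, rng):
    """Event times of a stationary Poisson process with the given rate."""
    t, times = 0.0, []
    for _ in range(n_events):
        t += rng.expovariate(rate)
        times.append(t)
    return times

rng = random.Random(1)                                # fixed seed for reproducibility
continuous = poisson_times(rate=1.0, n_events=20000, rng=rng)
discrete = sorted({int(t) for t in continuous})       # 1 s resolution, one spike per second
lv_continuous = local_variation(continuous)
lv_discrete = local_variation(discrete)
print(round(lv_continuous, 2), round(lv_discrete, 2))
```

The continuous train gives a value close to 1, as expected for a Poisson process, whereas the coarse-grained train gives a markedly smaller value, because many successive intervals are forced to be exactly one second. This is why a comparison with a null model processed at the same resolution is needed for popular hashtags.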

The empirical analysis also reveals an interesting pattern in the data, as more popular hashtags tend to present a more regular temporal behavior. This lack of burstiness ensures that hashtags do not disappear from the social network for very long periods of time, thereby allowing for a regular activation of the interest of Twitter users. These findings are reminiscent of recent observations in numerical simulations showing that burstiness hinders the size of cascades [35], and should be incorporated into theoretical models of information diffusion, in particular threshold [39] and stochastic models on temporal networks, but also into the ranking models capturing online heterogeneity in the empirical data.